Adding new document types to be recognized by
nndoc isn't difficult. You just have to whip up a
definition of what the document looks like, write a predicate
function to recognize that document type, and then hook into
nndoc.
First, here's an example document type definition:
(mmdf
 (article-begin . "^\^A\^A\^A\^A\n")
 (body-end . "^\^A\^A\^A\^A\n"))
The definition is simply a unique name followed by a series of regexp pseudo-variable settings. Below are the possible variables—don't be daunted by the number of variables; most document types can be defined with very few settings:
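first-article
    nndoc will skip past all text until it finds something that
    matches this regexp. All text before this will be totally ignored.

article-begin
    A regexp matching the beginning of each article. For cases that
    a simple regexp cannot handle, use article-begin-function
    instead of this.

article-begin-function
    A function that finds the beginning of each article. This setting
    overrides article-begin.

head-begin
    A regexp matching the head of the article. For cases that a
    simple regexp cannot handle, use head-begin-function instead of this.

head-begin-function
    A function that finds the head of the article. This setting
    overrides head-begin.

head-end
    A regexp matching the end of the head of the article.

body-begin
    A regexp matching the beginning of the body of the article. For
    cases that a simple regexp cannot handle, use body-begin-function
    instead of this.

body-begin-function
    A function that finds the beginning of the body of the article.
    This setting overrides body-begin.

body-end
    A regexp matching the end of the body of the article. For cases
    that a simple regexp cannot handle, use body-end-function instead of this.

body-end-function
    A function that finds the end of the body of the article. This
    setting overrides body-end.

file-begin
    A regexp matching the beginning of the file. All text before it
    is ignored.

file-end
    A regexp matching the end of the file. All text after it is
    ignored.

So, using these variables nndoc is able to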
dissect a document file into a series of articles, each with a
head and a body. However, a few more variables are needed since
not all document types are all that news-like—variables
needed to transform the head or the body into something that's
palatable for Gnus:
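prepare-body-function
    A function that is run on the body of an article before the
    article is delivered, for example to undo quoting or encoding
    applied by the document format.

article-transform-function
    A function for more wide-ranging transformation of both the head
    and the body of an article.

generate-head-function
    A function that generates a head that Gnus can understand.

generate-article-function
    A function that generates an entire article that Gnus can
    understand.

dissection-function
    A function that dissects the document by itself, overriding
    first-article, article-begin, article-begin-function, head-begin,
    head-begin-function, head-end, body-begin, body-begin-function,
    body-end, body-end-function, file-begin, and file-end.

Let's look at the most complicated example I can come up with, standard digests: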
(standard-digest
 (first-article . ,(concat "^" (make-string 70 ?-) "\n\n+"))
 (article-begin . ,(concat "\n\n" (make-string 30 ?-) "\n\n+"))
 (prepare-body-function . nndoc-unquote-dashes)
 (body-end-function . nndoc-digest-body-end)
 (head-end . "^ ?$")
 (body-begin . "^ ?\n")
 (file-end . "^End of .*digest.*[0-9].*\n\\*\\*\\|^End of.*Digest *$")
 (subtype digest guess))
We see that all text before a 70-character line of dashes is
ignored; all text after a line matching
‘^End of’ is
also ignored; each article begins with a 30-character line of dashes;
the line separating the head from the body may contain a single
space; and the body is run through
nndoc-unquote-dashes before being delivered.
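To see what the two dash regexps actually match, here is a minimal sketch that applies them by hand to a made-up digest. It is not nndoc's own dissection code, and every `my-digest-` name is hypothetical; it only counts how many separator lines the article-begin regexp finds once the first-article regexp has matched.

```elisp
;; Illustrative sketch only; this is not nndoc's own dissection code.
;; All `my-digest-' names are hypothetical.
(defconst my-digest-first-article
  (concat "^" (make-string 70 ?-) "\n\n+")
  "Same regexp as `first-article' in the definition above.")

(defconst my-digest-article-begin
  (concat "\n\n" (make-string 30 ?-) "\n\n+")
  "Same regexp as `article-begin' in the definition above.")

(defun my-digest-count-separators (text)
  "Return the number of `article-begin' matches in digest TEXT.
Only text after the `first-article' match is searched.  Return nil
if the `first-article' regexp does not match at all."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (re-search-forward my-digest-first-article nil t)
      (let ((count 0))
        (while (re-search-forward my-digest-article-begin nil t)
          (setq count (1+ count)))
        count))))

(defconst my-digest-sample
  (concat "Preamble that is skipped\n"
          (make-string 70 ?-) "\n\n"
          "Topics of the day\n"
          "\n" (make-string 30 ?-) "\n\n"
          "Subject: one\n\nFirst body\n"
          "\n" (make-string 30 ?-) "\n\n"
          "Subject: two\n\nSecond body\n")
  "A made-up digest with two 30-dash separator lines.")
```

Evaluating `(my-digest-count-separators my-digest-sample)` returns 2, one for each 30-dash line. For a text without the 70-dash line, the function returns nil.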
To hook your own document definition into nndoc,
use the nndoc-add-type function. It takes two
parameters—the first is the definition itself and the
second (optional) parameter says where in the document type
definition alist to put this definition. The alist is traversed
sequentially, and
nndoc-TYPE-type-p is called
for each type TYPE. So
nndoc-mmdf-type-p is called to see whether a
document is of mmdf type, and so on. These type
predicates should return nil if the document is not
of the correct type; t if it is of the correct type;
and a number if the document might be of the correct type. A high
number means high probability; a low number means low probability
with ‘0’ being
the lowest valid number.
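Putting the pieces together, the following sketch registers an invented document type. The type name `my-marker`, the `%%ARTICLE%%` separator line, and the score of 10 are all assumptions made up for illustration; only nndoc-add-type and the nndoc-TYPE-type-p naming convention come from nndoc. The predicate assumes that point is at the start of the document when nndoc calls it.

```elisp
(require 'nndoc)

;; Hypothetical document type: every article starts with a line
;; reading "%%ARTICLE%%".  Neither the type name nor the marker is
;; part of nndoc.
(defun nndoc-my-marker-type-p ()
  "Guess whether the current buffer holds a `my-marker' document.
Assumes point is at the start of the document."
  (cond
   ;; The marker is the very first line: certainly our type.
   ((looking-at "%%ARTICLE%%\n") t)
   ;; The marker appears later on: possibly our type, so return a
   ;; number.  The value 10 is an arbitrary choice for this sketch.
   ((save-excursion (re-search-forward "^%%ARTICLE%%\n" nil t)) 10)
   ;; No marker at all: not our type.
   (t nil)))

;; Register the definition.  The optional second parameter is left
;; out here.
(nndoc-add-type
 '(my-marker
   (article-begin . "^%%ARTICLE%%\n")
   (body-end . "^%%ARTICLE%%\n")))
```

As in the mmdf example, the same regexp serves as both article-begin and body-end, since the marker line that opens one article also closes the previous one.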